Higher Criticism for Detecting Sparse Heterogeneous Mixtures

Authors

  • David Donoho
  • Jiashun Jin
Abstract

Higher Criticism, or second-level significance testing, is a multiple comparisons concept mentioned in passing by Tukey (1976). It concerns a situation where there are many independent tests of significance and one is interested in rejecting the joint null hypothesis. Tukey suggested comparing the fraction of observed significances at a given α-level to the expected fraction under the joint null. In fact, he suggested standardizing the difference of the two quantities to form a z-score; the resulting z-score tests the significance of the body of significance tests. We consider a generalization, where we maximize this z-score over a range of significance levels 0 < α ≤ α0. We are able to show that the resulting Higher Criticism statistic is effective at resolving a very subtle testing problem: testing whether n normal means are all zero versus the alternative that a small fraction is nonzero. The subtlety of this ‘sparse normal means’ testing problem can be seen from the work of Ingster (1999) and Jin (2002), who studied such problems in great detail. They identified an interesting range of cases where the fraction of nonzero means is so small that the alternative hypothesis has little noticeable effect on the distribution of the p-values, either for the bulk of the tests or for the few most highly significant tests. In this range, when the amplitude of nonzero means is calibrated with the fraction of nonzero means, the likelihood ratio test for a precisely specified alternative would still succeed in separating the two hypotheses. We show that Higher Criticism is successful throughout the same region of amplitude vs. sparsity where the likelihood ratio test would succeed. Since it does not require a specification of the alternative, this shows that Higher Criticism is in a sense optimally adaptive to unknown sparsity and size of the non-null effects.
While our theoretical work is largely asymptotic, we provide simulations in finite samples and suggest some possible applications. We also show that Higher Criticism works well over a range of non-Gaussian cases.
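
The abstract describes the statistic only in words: standardize the excess of observed significances at level α into a z-score, then maximize over 0 < α ≤ α0. The following is a minimal sketch of one way to compute such a statistic, not the authors' code. It assumes the maximization is carried out at the sorted p-values p_(1) ≤ ... ≤ p_(n), restricted to indices i ≤ α0·n, and that the z-score at p_(i) is √n (i/n − p_(i)) / √(p_(i)(1 − p_(i))). The function names `higher_criticism`, `one_sided_p_value`, and `simulate_p_values`, the default `alpha0=0.5`, and the choice to skip p-values equal to 0 or 1 are all illustrative assumptions.

```python
import math
import random


def higher_criticism(p_values, alpha0=0.5):
    """Largest standardized excess of small p-values over levels up to alpha0.

    Assumed form: max over i <= alpha0 * n of
        sqrt(n) * (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i))).
    """
    n = len(p_values)
    sorted_p = sorted(p_values)
    last_index = max(1, int(alpha0 * n))
    best = -math.inf
    for i in range(1, last_index + 1):
        p = sorted_p[i - 1]
        if p <= 0.0 or p >= 1.0:
            continue  # assumption: skip levels where the z-score is undefined
        z = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))
        best = max(best, z)
    return best


def one_sided_p_value(z):
    """Upper-tail p-value of a standard normal observation."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def simulate_p_values(n, fraction, amplitude, seed):
    """Hypothetical sparse normal means model: the first round(fraction * n)
    observations have mean `amplitude`; the rest have mean zero."""
    rng = random.Random(seed)
    k = int(round(fraction * n))
    observations = [rng.gauss(0.0, 1.0) + (amplitude if j < k else 0.0)
                    for j in range(n)]
    return [one_sided_p_value(x) for x in observations]


if __name__ == "__main__":
    null_p = simulate_p_values(n=2000, fraction=0.0, amplitude=0.0, seed=1)
    alt_p = simulate_p_values(n=2000, fraction=0.02, amplitude=4.0, seed=1)
    print("HC under the joint null:    ", higher_criticism(null_p))
    print("HC with 2% nonzero means:   ", higher_criticism(alt_p))
```

Because both simulated data sets use the same seed, they share the same noise, so the comparison isolates the effect of the nonzero means. The amplitude here is deliberately large; the subtle regime discussed in the abstract involves much weaker signals.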

Similar resources

Optimal detection of heterogeneous and heteroscedastic mixtures

The problem of detecting heterogeneous and heteroscedastic Gaussian mixtures is considered. The focus is on how the parameters of heterogeneity, heteroscedasticity, and proportion of non-null component influence the difficulty of the problem. We establish an explicit detection boundary which separates the detectable region where the likelihood ratio test is shown to reliably detect the presence...

Gene-based Higher Criticism methods for large-scale exonic single-nucleotide polymorphism data

In genome-wide association studies, gene-based methods measure potential joint genetic effects of loci within genes and are promising for detecting causative genetic variations. Following recent theoretical research in statistical multiple-hypothesis testing, we propose to adapt the Higher Criticism procedures to develop novel gene-based methods that use the information of linkage disequilibriu...

Optimal Detection For Sparse Mixtures

Detection of sparse signals arises in a wide range of modern scientific studies. The focus so far has been mainly on Gaussian mixture models. In this paper, we consider the detection problem under a general sparse mixture model and obtain an explicit expression for the detection boundary. It is shown that the fundamental limits of detection are governed by the behavior of the log-likelihood rati...

Innovated Higher Criticism for Detecting Sparse Signals in Correlated Noise

Higher Criticism is a method for detecting signals that are both sparse and weak. Although first proposed in cases where the noise variables are independent, Higher Criticism also has reasonable performance in settings where those variables are correlated. In this paper we show that, by exploiting the nature of the correlation, performance can be improved by using a modified approach which expl...

Higher Criticism for Detecting Sparse Heterogeneous Mixtures by David Donoho

Higher criticism, or second-level significance testing, is a multiple comparisons concept mentioned in passing by Tukey. It concerns a situation where there are many independent tests of significance and one is interested in rejecting the joint null hypothesis. Tukey suggested comparing the fraction of observed significances at a given α-level to the expected fraction under the joint null. In fa...

Publication date: 2003